feat: OpenAI Ingredient Parsing #3581
Conversation
I've only skimmed it, this is not a definitive review! Is it possible to make the endpoint configurable? I'm not across all the details, but I believe a lot of the self-hosted LLM projects have "OpenAI-compatible" endpoints. It would be great if we could easily support those as well, particularly given Mealie is in the same self-hosting space. (Obviously, I'd be happy for any change here to be a subsequent PR)
It might be possible, but it would require a lot of extra work for a few reasons, which I've stayed away from in this PR.
Can you elaborate on what you mean by "JSON response unique to OpenAI"? I believe what boc is saying is that projects like ollama have an OpenAI-compatible API, allowing it to act as a drop-in replacement for the endpoint in the OpenAI library.
Specifically OpenAI's JSON mode: https://platform.openai.com/docs/guides/text-generation/json-mode
I didn't realize that works even with the OpenAI library; that's super nice. Looks like we can just make the OpenAI base URL customizable and enable this. What I wanted to avoid was writing a custom client to interact with OpenAI, since that's a lot to maintain and really out of scope for Mealie.
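For reference, making the base URL configurable could look roughly like this. This is a minimal sketch: `OPENAI_BASE_URL` and the `OpenAISettings` shape are assumptions for illustration, not names this PR necessarily uses.

```python
import os
from dataclasses import dataclass
from typing import Optional

# Sketch only: OPENAI_BASE_URL and OpenAISettings are hypothetical names.
@dataclass
class OpenAISettings:
    api_key: str
    base_url: str  # any OpenAI-compatible endpoint, e.g. a local ollama server

def load_openai_settings() -> Optional[OpenAISettings]:
    api_key = os.environ.get("OPENAI_API_KEY")
    if not api_key:
        return None  # no key -> OpenAI features stay disabled
    return OpenAISettings(
        api_key=api_key,
        # ollama typically exposes an OpenAI-compatible API at http://localhost:11434/v1
        base_url=os.environ.get("OPENAI_BASE_URL", "https://api.openai.com/v1"),
    )
```

The official `openai` Python client accepts a `base_url` argument, so the same client code can talk to any compatible server.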
If I understand this correctly, the input is still the metadata from a website in an open recipe format, right? I ask because I am parsing a website's HTML with GPT, as not every website has the recipe in a structured format. I've stumbled across multiple examples where the recipe is only available in the text/HTML and ChatGPT needs to intelligently parse it into JSON. It works very well, actually.
Correct, this PR is not for scraping websites and generating recipes. This is for recipes that have already been imported, but whose ingredients are not yet parsed. However, I do have plans to support alternative import methods using OpenAI, building off of the foundation of this PR. Theoretically we can fall back to parsing a website with OpenAI when recipe metadata isn't available.
There are a few different discussions that I think are great ways to apply this to other areas of Mealie later down the road.
Does this require a paid OpenAI account? The default model is
You may use any LLM that has an OpenAI-compatible API; for instance, see ollama, posted above. You just need to specify your own. I've only tested with gpt-4 (and its variants), so I can only confirm that those work; however, it's fully configurable per instance. I will say that with gpt-4 you blow through the free tier extremely quickly. I've built in some measures to reduce costs, as well as some configurability to trade off speed vs. cost. With gpt-4o it seems to cost 5-10 cents per parsed recipe (with 2 workers and ~10 ingredients).
Is there a reason to prefer a more powerful and more expensive model than 3.5-turbo ($0.50/1M tokens) as the default? |
Short answer: No, not really, but the default hardly matters when it doesn't work out of the box anyway; at a minimum you need to supply an API key, so there's nothing stopping you from also setting the model. Longer answer: I've had a lot more success with GPT-4 when it comes to anything other than conversational interaction. GPT-3.5 is also a lot moodier when it comes to following prompts. GPT-4 is also much better at parsing non-English languages, which is particularly important for a parser that needs to understand grammar.
LGTM. I particularly like the introduction of parserLoading. Let's get this in front of people and see what feedback comes through!
What type of PR is this?
(REQUIRED)
What this PR does / why we need it:
(REQUIRED)
This PR opens the door to implementing OpenAI in Mealie, and implements a new OpenAI ingredient parser. At a high level, this adds an OpenAI service that manages stored prompts and data injection to call the OpenAI API and receive a JSON response (which we then parse into a Pydantic model).
To enable OpenAI features, users need to include their OpenAI API key in the backend config (using the OPENAI_API_KEY env var). There are a few other configuration options to tweak performance vs. cost (since the API isn't free). Since OpenAI configuration is done via environment variables, this doesn't require any DB migrations.
The way this works is we have stored prompts which get sent to OpenAI to instruct it on what to do, i.e. "You are a bot designed to parse ingredients for recipes" (the actual prompt is much longer and goes into far more detail). It then sends a JSON list of inputs as the user message for it to process.
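The message shape described above can be sketched roughly like this. The prompt text and function name here are illustrative stand-ins, not what actually ships in this PR:

```python
import json

SYSTEM_PROMPT = (
    # Stand-in text; the real stored prompt is much longer and more detailed.
    "You are a bot designed to parse ingredients for recipes. "
    "Respond only with JSON matching the provided schema."
)

def build_messages(ingredients: list) -> list:
    """Build a chat-completions message list: stored system prompt,
    then a JSON list of inputs as the user message."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": json.dumps(ingredients)},
    ]
```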
The OpenAI API supports returning its response in JSON format, which is perfect for FastAPI/Pydantic validation. I used Pydantic's BaseModel.model_dump_json() to inject the expected response schema into the prompt, which makes GPT always respond in a parsable format. From there, implementing an interface is simple:
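Conceptually, the schema-injection-and-validation loop looks like the stdlib-only sketch below (the real code uses Pydantic models rather than raw dicts, and JSON mode is enabled via the API's `response_format={"type": "json_object"}` option; the field names here are made up for illustration):

```python
import json

# Hypothetical example shape; the real PR serializes a Pydantic model instead.
EXAMPLE_RESPONSE = {
    "ingredients": [{"input": "", "quantity": 0, "unit": "", "food": "", "note": ""}]
}

def schema_hint() -> str:
    """Serialize an example response so the prompt pins down the output shape."""
    return "Respond with JSON in exactly this shape: " + json.dumps(EXAMPLE_RESPONSE)

def validate_response(raw: str) -> list:
    """Parse the model's JSON reply and enforce the keys we asked for."""
    data = json.loads(raw)
    for item in data["ingredients"]:
        missing = set(EXAMPLE_RESPONSE["ingredients"][0]) - set(item)
        if missing:
            raise ValueError(f"response missing keys: {missing}")
    return data["ingredients"]
```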
Our OpenAI service handles the prompt injection, additional data injection (see below), and API handling; you just need to provide it the data and a description of how to use the data.
For the parser I opted to serialize our unit store and send it along with the rest of the prompt. This gives GPT some training data to say "you should expect to see these units". Originally I also included foods, but it didn't seem to help much at all (and adding the entire food store racks up API costs). This is configurable in the env settings: if you want to reduce costs, you can skip the optional data injection.
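A sketch of that optional data injection (the function name and prompt wording are assumptions; the real service does this internally):

```python
import json

def inject_units(prompt: str, units: list, enabled: bool = True) -> str:
    """Optionally append the instance's known units so the model expects them."""
    if not enabled or not units:
        # Skipping the injection keeps the request smaller and cheaper.
        return prompt
    return prompt + "\nExpect to see these units: " + json.dumps(units)
```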
The OpenAI API isn't very fast when the responses are long. I took a bunch of measures to optimize this, but you can also split the ingredients into chunks and send multiple async requests (one for each chunk). This speeds up the parse time considerably, but costs more. The worker count is configurable in the env settings.
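The chunking idea can be sketched with asyncio as below; the function names and the shape of `parse_chunk` are assumptions, not the PR's actual API:

```python
import asyncio

def chunk(items: list, workers: int) -> list:
    """Split ingredients into roughly equal chunks, one per worker."""
    workers = max(1, min(workers, len(items)))
    size = -(-len(items) // workers)  # ceiling division
    return [items[i : i + size] for i in range(0, len(items), size)]

async def parse_all(items: list, workers: int, parse_chunk) -> list:
    """Fire one request per chunk concurrently and flatten the results in order."""
    results = await asyncio.gather(*(parse_chunk(c) for c in chunk(items, workers)))
    return [row for group in results for row in group]
```

More workers means more concurrent requests (faster wall-clock time) but also more prompt overhead repeated per request, which is the speed-vs-cost trade-off mentioned above.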
This PR also adds some QoL features on the frontend for parsing ingredients. Namely:
I've also hidden the OpenAI ingredient parser if OpenAI isn't enabled (i.e. you haven't provided an API key).
Which issue(s) this PR fixes:
(REQUIRED)
N/A, though it has been discussed on and off
Special notes for your reviewer:
(fill-in or delete this section)
The prompts (this one and future ones) will likely go through a bunch of iterations before we hit that "sweet spot" for getting the best results out of GPT. Ideally, they will be optimized for newer models in the future (we may even decide to have different prompts for different models). This is why I specifically included an env var for the OpenAI model to use, so that we aren't forced to keep up with the rapidly evolving AI space, sort of like pinning a package version.
This opens up some exciting possibilities in the future, such as importing strange recipe sources (unstructured data, OCR, etc.).
Testing
(fill-in or delete this section)
You need an OpenAI API key to properly test this, but I added a mocked test to confirm the flow works as long as we get data back from OpenAI.
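A mocked test along these lines can exercise the flow without an API key. The `parse_ingredients` helper and the response shape here are hypothetical stand-ins for the real service, purely to show the mocking approach:

```python
import json
from unittest.mock import Mock

# Hypothetical stand-in for the real parser: it takes a callable that plays
# the role of the OpenAI request and decodes the JSON reply.
def parse_ingredients(raw: list, openai_call) -> list:
    reply = openai_call(json.dumps(raw))
    return json.loads(reply)["ingredients"]

# Mock the "OpenAI" callable so no network access or API key is needed.
mocked = Mock(return_value=json.dumps(
    {"ingredients": [{"input": "2 cups flour", "quantity": 2, "unit": "cup", "food": "flour"}]}
))
result = parse_ingredients(["2 cups flour"], mocked)
assert result[0]["food"] == "flour"
```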